Developing forecasting model for future pandemic applications based on COVID-19 data 2020–2022

Improving the accuracy, efficiency, and precision of forecasting, particularly time series forecasting, is crucial for authorities to forecast, monitor, and prevent COVID-19 cases so that the spread of the disease can be controlled more effectively. However, predictions from single models can be inaccurate, imprecise, and inefficient because the data sets contain both linear and non-linear patterns. Therefore, a hybrid approach has been implemented to produce more accurate and efficient COVID-19 predictions that are closer to the true values. The aims of this study are (1) to propose a hybrid ARIMA-SVM model that produces better forecasting results, and (2) to investigate the performance of the proposed model and its percentage improvement over the ARIMA and SVM models. Statistical measures, namely MSE, RMSE, MAE, and MAPE, were then used to verify that the proposed model outperforms the ARIMA and SVM models. Empirical results on three real datasets of COVID-19 cases in Malaysia show that, compared with the ARIMA and SVM models, the proposed model yields the smallest MSE, RMSE, MAE, and MAPE values for both the training and testing datasets, meaning that its predicted values are closer to the actual values. These results show that the proposed model can generate estimates more accurately and efficiently. Compared with ARIMA and SVM, the proposed model achieves much larger error reductions for all datasets, with maximum reductions of 73.12%, 74.6%, 90.38%, and 68.99% in MAE, MAPE, MSE, and RMSE, respectively. Therefore, the proposed model can be an effective way to improve the accuracy and efficiency of predicting COVID-19 cases.


Introduction
The city of Wuhan in Hubei province, China, is etched in history as the first place where coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), spread. On 30 January 2020, the World Health Organization (WHO) declared the outbreak a "Public Health Emergency of International Concern" [1]. The virus was originally thought to have come from a seafood market in Wuhan. China publicly shared the genetic sequence of the virus on 11 January 2020, and human-to-human transmission drove its rapid spread, with a total of 9,129,146 confirmed cases, including 473,797 deaths, across the globe by 24 June 2020 [2]. By 1 May 2021, the COVID-19 pandemic had infected more than 151 million people worldwide and caused 3 million deaths. Countries such as the USA, Brazil, Russia, Spain, the UK, Italy, France, Germany, China, India, Iran, and Pakistan were among the most affected by COVID-19. The first COVID-19 cases in Malaysia were reported on 24 January 2020 and were detected in Chinese tourists entering the country from Singapore [3]. In the early stage, only single-digit numbers of daily cases were reported; however, daily cases had increased to 235 by 26 March [4]. The number of daily cases in Malaysia continued to rise exponentially, reaching around 20,000 by August 2021. The Malaysian government implemented the Movement Control Order (MCO), the Conditional MCO (CMCO), and the Recovery MCO (RMCO) from 18 March to 12 May 2020, 13 May to 9 June, and 9 June to 31 December, respectively. During this period, all travel and socio-economic activities were restricted nationwide (gatherings for religious and cultural occasions were not allowed) to keep new infections at bay and to avoid overloading the country's healthcare system.
All government and private offices and educational institutions, as well as transport hubs, were closed; citizens were instructed to stay at home, and interstate travel was banned, with fines of up to RM10,000 for violators.
Since WHO declared the COVID-19 outbreak a pandemic, governments and medical institutions worldwide have made great efforts to find vaccines and treatments to control the spread of the virus. Statistical modelling, particularly forecasting of COVID-19 cases, has also been carried out extensively by statisticians and health scientists to help the health system curb the infection. In this scenario, the ability to pinpoint the rate at which the epidemic is spreading is crucial for fighting back and for informing governments' societal planning and policymaking so that they can deal accurately with the consequences of the infection. Compared with existing research, the motivations of this research are (i) to develop a more accurate and efficient forecasting model of the spread of COVID-19 in Malaysia, and (ii) to compare the performance of this novel model with ARIMA and SVM. This model can assist public health authorities with pre-emptive and preventive planning to curtail the impact of future pandemics.
During the pandemic, many studies used different mathematical and statistical models to predict the spread of COVID-19. One of the most popular and widely used time series models for analysing and predicting the spread of the disease is the ARIMA (p,d,q) model [5-7]. Forecasting daily new COVID-19 cases was difficult because the cases were growing daily. In the first wave, COVID-19 cases increased continuously for some period and then declined. In the second wave, however, cases increased again, and some of them were difficult to predict. In this scenario, several researchers forecast the COVID-19 pattern using ARIMA [8-15]. However, the ARIMA model has a limitation: it can normally handle only a linear time series structure [16]. ARIMA approximations are therefore inadequate for nonlinear patterns, which is a barrier for researchers in time series forecasting [17]. Support Vector Machines (SVMs) offer an alternative, although, despite their strong performance, their classification performance and generalisation ability are frequently affected by the dimension or number of feature variables used, as noted by Lee [18]. As the SVM model has been developed further, it has been able to provide accurate and efficient results in many prediction problems. SVMs, first introduced by Vladimir Vapnik in 1995 [19] within statistical learning theory and structural risk minimisation, have been shown to perform well on a variety of forecasting and classification problems. SVMs can also cope with difficulties that the ARIMA model cannot, such as nonlinearity, local minima, and high dimensionality, and they have recently been used to handle exactly these issues [16,20-22].
SVMs can also ensure higher accuracy for long-term prediction than other computational approaches in many practical applications. However, a single SVM model, like a single ARIMA model, has limitations: the SVM model is suited to nonlinear data rather than linear data. Given the constraints of single ARIMA and SVM models in in-depth time series analysis, hybrid approaches have become a preferred way to overcome both limitations, and they have had a significant impact in numerous fields owing to their dynamic nature and their capability to predict with higher accuracy, efficiency, and precision. This approach matters because almost all real-world time series contain both linear and nonlinear correlation patterns. Recently, hybrid forecasting methods have been used with great success to enhance forecasting accuracy [16,17,20-26].
Regarding the spread of COVID-19, the hybrid time series approach is crucial for predicting the impact of the outbreak, and it has been shown to be successful in predicting COVID-19 [27-30]. Thus, this study aims (a) to propose a hybrid ARIMA-SVM approach that produces better forecasting results, i.e., an estimator that generates small error terms; and (b) to investigate the performance of the proposed model by comparing it with the ARIMA and SVM models on three daily COVID-19 datasets from Malaysia: daily new positive cases, daily new fatality cases, and daily new recovered cases. Despite recent advances in time series modelling, particularly for COVID-19, the model-building process has not specifically included COVID-19 cases in Malaysia to help the authorities deal with the spread of the outbreak by producing more efficient, accurate, and precise forecasts. Therefore, rather than relying on conventional approaches to the COVID-19 data, this study relies on intelligent-based prediction methods to better predict future pandemics. According to Moore [31], the next pandemic is likely to be caused by a new strain of bird influenza, such as the H7N9 virus, or by a novel coronavirus. Although future outbreaks are inevitable, intelligent-based prediction methods can produce more efficient, accurate, and precise forecasts of the dynamic spread of a virus, supporting pre-emptive preventive procedures by local health care authorities [32,33]. The model could also be used to predict coronavirus or bird flu outbreaks in the future, especially in tropical rainforest countries such as Malaysia.
Although vaccines are now available and the number of deaths worldwide is low, this model could be useful for making accurate predictions if similar outbreaks occur in the future. As a result, the spread of such a disease can be predicted earlier, so that better health facilities can be built, legislative measures can be taken, and economic losses, and especially human losses, can be avoided.
The rest of this paper is organized as follows. The Materials and methods section describes the methods used to develop the proposed model, followed by a brief formulation of the hybrid ARIMA-SVM model used in this study. The Results and discussion section presents the performance of the proposed model on three well-known COVID-19 case datasets. Finally, we conclude the paper and provide recommendations for further work.

Materials and methods
The ARIMA model. The Autoregressive Integrated Moving Average model, ARIMA (p,d,q), is one of the families of time series models commonly used for forecasting because of its flexibility with various categories of time series datasets [17]. It also explicitly caters to a set of standard patterns in time series analysis, providing an easy-to-use yet powerful way to create accurate time series predictions. However, limitations may arise from its pre-assumption of a linear form, that is, a linear relationship between the future value of the time series and its current and past values and white noise [16-18,22,34]. In the ARIMA model, p and q are the numbers of autoregressive and moving average terms, which together with d define the order of the model, while d is the integer order of differencing. The ARIMA model, applied to the series after d-th order differencing, is represented mathematically as follows:

$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} \qquad (1)$$
where $y_t$ and $\varepsilon_t$ are the actual value and the random error at time $t$, respectively. The random errors $\varepsilon_t$ are assumed to be independently and identically distributed (iid) with mean 0 and constant variance $\sigma^2$; $\phi_i$ ($i = 1,2,\ldots,p$) and $\theta_j$ ($j = 0,1,2,\ldots,q$) are the model parameters to be estimated.
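To make Eq (1) concrete, the following Python sketch (illustrative only, not the authors' R implementation) computes a one-step-ahead forecast for an ARIMA(p,1,q) model: the series is differenced once, Eq (1) is applied to the differences with the unknown future error set to its expected value of zero, and the last observed level is added back. The coefficients and values in the usage line are hypothetical. Note the sign convention: Eq (1) subtracts the MA terms, whereas R's `arima()` writes the MA part with plus signs, so MA coefficients reported by R would need to be negated before use here.

```python
# Illustrative one-step-ahead forecast from Eq (1) for an ARIMA(p,1,q) model.
# All coefficients are hypothetical, not the fitted values from this study.

def arma_one_step(history, errors, phi, theta, theta0=0.0):
    """Forecast the next value of a (differenced) series using Eq (1).

    history: past values of the differenced series, oldest first.
    errors:  past residuals eps of the differenced model, oldest first.
    phi, theta: AR coefficients [phi_1..phi_p] and MA coefficients [theta_1..theta_q].
    The future error eps_t has expectation 0, so it drops out of the forecast.
    """
    if len(history) < len(phi) or len(errors) < len(theta):
        raise ValueError("not enough past values for the requested order")
    ar_part = sum(p * history[-i] for i, p in enumerate(phi, start=1))
    # Eq (1) subtracts the MA terms: -theta_1*eps_{t-1} - ... - theta_q*eps_{t-q}
    ma_part = sum(t * errors[-j] for j, t in enumerate(theta, start=1))
    return theta0 + ar_part - ma_part


def arima_d1_forecast(levels, errors, phi, theta, theta0=0.0):
    """ARIMA(p,1,q): forecast the next first difference, then add the last level."""
    diffs = [b - a for a, b in zip(levels, levels[1:])]
    return levels[-1] + arma_one_step(diffs, errors, phi, theta, theta0)


# Hypothetical ARIMA(1,1,1) example; the differences are [2, 3, 4]
print(arima_d1_forecast([10, 12, 15, 19], errors=[0.5, -0.2], phi=[0.6], theta=[0.3]))
```

Here the forecast difference is 0.6 × 4 − 0.3 × (−0.2) = 2.46, so the forecast level is about 21.46.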

Support vector machines model
The support vector machine (SVM), introduced by Vladimir Vapnik [19] within statistical learning theory, handles high-dimensional data well, even with a small number of training examples, and generalizes excellently. Because the model selects a limited set of support vectors from the input data, it processes data quickly. The SVM regression function is written as follows.
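For a linear regression data set $\{x_i, y_i\}_{i=1}^{n}$, the function is formulated as follows:

$$f(x) = w \cdot x + b \qquad (2)$$

The coefficients $w$ and $b$ are estimated by minimizing

$$R(C) = \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n} L_{\varepsilon}\big(y_i, f(x_i)\big) \qquad (3)$$

where $L_{\varepsilon}$ is called the ε-insensitive loss function and is formulated as follows:

$$L_{\varepsilon}\big(y, f(x)\big) = \begin{cases} |y - f(x)| - \varepsilon, & |y - f(x)| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$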

PLOS ONE
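By introducing positive slack variables $\xi_i$ and $\xi_i^{*}$, Eq (3) can be transformed into the following constrained formulation:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \qquad (5)$$

$$\text{subject to}\quad \begin{cases} y_i - w\cdot x_i - b \le \varepsilon + \xi_i \\ w\cdot x_i + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1,\ldots,n \end{cases}$$

This problem is usually solved via duality theory, which converts it into a convex quadratic programming problem. Introducing Lagrange multipliers $\alpha_i$ and $\alpha_i^{*}$, Eq (5) becomes

$$\max_{\alpha,\,\alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\,(x_i \cdot x_j) - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^{*}) \qquad (6)$$

$$\text{subject to}\quad \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C$$

When the data set cannot be regressed linearly, the data are mapped by $\phi$ into a high-dimensional feature space, where a linear regression is performed. The formulation then becomes

$$\max_{\alpha,\,\alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\,K(x_i, x_j) - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^{*}) \qquad (7)$$

subject to the same constraints as Eq (6), where $K(x_i, x_j) = \phi(x_i)\cdot\phi(x_j)$ is the inner product in the feature space and is called the kernel function. The resulting regression function is

$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*})\,K(x_i, x) + b \qquad (8)$$

Any symmetric function that satisfies Mercer's condition can be used as a kernel function [19]. The Gaussian kernel, $K(x_i, x_j) = \exp\left(-\gamma\lVert x_i - x_j\rVert^{2}\right)$, is used in this study.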
SVMs were employed to estimate the nonlinear behaviour of the forecasting data set because Gaussian kernels tend to perform well under general smoothness assumptions [23].
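The three building blocks described above (the ε-insensitive loss, the Gaussian kernel, and prediction from dual coefficients once the quadratic program has been solved) can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: solving the quadratic program itself is left to a library (the study used the `e1071` package in R), and the support vectors and coefficients in the usage line are hypothetical. The kernel is written as exp(−γ‖u − v‖²), the same parameterization used by the radial kernel in `e1071::svm`; that the reported γ values follow this parameterization is an assumption.

```python
import math


def eps_insensitive_loss(y, f, eps):
    """epsilon-insensitive loss: zero inside the eps-tube around f, linear outside it."""
    r = abs(y - f)
    return r - eps if r >= eps else 0.0


def gaussian_kernel(u, v, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Kernel regression function f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i - alpha_i* for support vector i.
    """
    return sum(c * gaussian_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical fitted model with two one-dimensional support vectors
print(svr_predict([0.0], [[0.0], [1.0]], [0.5, -0.5], b=0.1, gamma=2.0))
```

Residuals smaller than ε cost nothing, which is why ε controls how many training points end up as support vectors.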

Proposed hybrid models
Despite the variety of time series models available, the accuracy, effectiveness, and precision of time series forecasting have become fundamental to many decision-making processes. However, the ARIMA and SVM models do not always deliver these qualities on their own. This is the main reason why time series forecasting remains a crucial, challenging, dynamic, and active research topic in many fields. ARIMA and SVM models have each achieved success in their respective linear and nonlinear domains [16,25,26]. However, neither is a generic approach that can be generalized to all situations. Hence, a hybrid strategy that employs both linear and nonlinear modelling capabilities is recommended, mainly to improve overall prediction effectiveness. Moreover, no research on improving the effectiveness of forecasting models has been conducted specifically for COVID-19 in Malaysia.
This study has two motivations for hybrid models. First, a single ARIMA or SVM model may not be sufficient to identify all the characteristics of the time series. Second, either model, or both, may fail to recognize the actual data-generating process. Building the hybrid model involves two parts: Part I models the linear autocorrelation component, and Part II models the nonlinear component. Thus,

$$y_t = L_t + N_t \qquad (9)$$

where $L_t$ and $N_t$ denote the linear and nonlinear components, respectively. Both components must be estimated from the data. Part I focuses on linear modelling, using the ARIMA model to model the linear component. The residuals from this first model contain the nonlinear interactions, which cannot be modelled by a linear model, and possibly some remaining linear relationship. Let $e_t$ denote the residual from the linear model at time $t$; then

$$e_t = y_t - \hat{L}_t \qquad (10)$$

where $\hat{L}_t$ is the predicted value for time $t$ from the estimated relationship in (1). According to Aisyah et al. [16], the residual data set after ARIMA fitting will contain only nonlinear relationships, which a linear model cannot properly represent. The results of the first stage, which contain the forecast values and residuals of the linear modelling, are then used in Part II. Part II focuses on nonlinear modelling, in which the SVM is used to model the nonlinear (and possibly linear) relationship occurring in the residuals of the linear modelling and in the original data. The residuals are then modelled with the SVM using various configurations as follows:

$$e_t = f(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) + \varepsilon_t \qquad (11)$$

where $f$ is a nonlinear function determined by the SVM model and $\varepsilon_t$ is the random error.
Thus, the combined forecast is $\hat{y}_t = \hat{L}_t + \hat{N}_t$, where the forecast of the residuals from Eq (11) is identified as $\hat{N}_t$; that is, the forecasted values are obtained by summing the linear and nonlinear components. Fig 1 shows the functional flowchart of the hybrid model. In short, the proposed hybrid methodology consists of two parts. In Part I, the ARIMA model is used to analyse the linear component. In Part II, an SVM model is developed to model the residuals from Part I. Because the ARIMA model in Part I cannot handle the nonlinear component of the data, the residuals of the linear model contain information about the nonlinearity. The SVM results can be treated as forecasts of the error terms of the ARIMA model. The hybrid model exploits the distinctive features and strengths of both the ARIMA and SVM models in capturing different patterns. Therefore, it is more effective to model the linear and nonlinear patterns separately with two different models and then combine their forecasts to improve overall modelling and forecasting performance.
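The data flow between the two parts can be sketched in Python: compute the ARIMA residuals, arrange them into lagged input rows for the SVM as in Eq (11), and add the two component forecasts. This is a minimal sketch under stated assumptions: the ARIMA fitted values and the SVM forecast are assumed to come from elsewhere (in this study, from R), the numbers in the usage line are made up, and the lag count n is a free choice that the text does not fix.

```python
def residuals(actual, linear_fit):
    """e_t = y_t - L_hat_t: the part of the series the ARIMA stage leaves unexplained."""
    return [y - l for y, l in zip(actual, linear_fit)]


def lagged_design(e, n):
    """Build (inputs, targets) for Eq (11): e_t ~ f(e_{t-1}, ..., e_{t-n}).

    Each input row is [e_{t-1}, e_{t-2}, ..., e_{t-n}] and its target is e_t.
    """
    rows, targets = [], []
    for t in range(n, len(e)):
        rows.append([e[t - k] for k in range(1, n + 1)])
        targets.append(e[t])
    return rows, targets


def combine(linear_forecast, nonlinear_forecast):
    """Hybrid forecast y_hat_t = L_hat_t + N_hat_t."""
    return linear_forecast + nonlinear_forecast


# Hypothetical actual values and ARIMA fitted values
e = residuals([100, 110, 125, 130, 128], [98, 112, 121, 133, 127])
X, y = lagged_design(e, 2)
print(e, X, y)
```

The rows in `X` and targets in `y` are what an SVR would be trained on; its prediction for the next residual plays the role of $\hat{N}_t$ in the combined forecast.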

Proposed algorithm
Step 1: Three time series of COVID-19 cases (1 October 2020 to 4 November 2022), namely daily new positive cases, daily new death cases, and daily new recovered cases, are generated in the R programming language.
Step 2: The generated datasets are defined as $X_{1i} = \{x_{11}, x_{12}, x_{13}, \ldots, x_{1n}\}$, $X_{2i} = \{x_{21}, x_{22}, x_{23}, \ldots, x_{2n}\}$, and $X_{3i} = \{x_{31}, x_{32}, x_{33}, \ldots, x_{3n}\}$ for daily new positive cases, daily new death cases, and daily new recovered cases, respectively. The best ARIMA (p,d,q) model is then selected after checking the autocorrelation function (ACF) plot of the ARIMA (p,d,q) residuals. The best-fitting model for daily new positive cases is ARIMA (2,1,2), while ARIMA (1,1,2) and ARIMA (0,1,1) are the best-fitting models for daily new fatality cases and daily new recovered cases of COVID-19, respectively.

Future pandemic applications based on COVID-19 data 2020-2022
Step 4: Combine the values in Step 3 as a set of input variables to obtain the output $y_t$.
Step 5: The ARIMA (p,d,q) model is defined by the order q. Based on the information in Step 4, the support vector machine is applied to the residuals to obtain the output $L_t$ using the R programming language.
Step 6: A fitted value of the ARIMA model hybridized with the support vector machine model is obtained for each sample. The residuals $\varepsilon_t$ are then generated to obtain the forecasting result $\hat{N}_t$.
Step 7: The framed data are split randomly into training and testing data for the support vector machine model, and the support vector machine procedure is run using the 'e1071' package in the R programming language.
Step 8: Treat the split data as the processing data, with the order q as in Step 5. The combined forecast is then given by Eq (15): $\hat{Y}_t = \hat{L}_t + \hat{N}_t$.
Step 9: Evaluate model performance using the statistical measures MSE, RMSE, MAE, and MAPE.
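Step 2 selects the ARIMA order after inspecting the ACF of the residuals. A minimal Python sketch of that check is shown below; it is not the authors' code. The ±1.96/√n band is the usual large-sample approximation drawn on ACF plots such as those from R's `acf()`; treating "all lags inside the band" as a pass is a simplification assumed here, not a rule stated in this study. A portmanteau test such as Ljung-Box (`Box.test()` in R) is a more formal alternative.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series (e.g. ARIMA residuals)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def acf_looks_like_white_noise(x, max_lag):
    """Rough check: every r_k lies inside the approximate 95% band +/-1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(x))
    return all(abs(r) <= bound for r in sample_acf(x, max_lag))


# A strongly alternating residual series has large negative lag-1 autocorrelation,
# so it would not pass as white noise.
print(sample_acf([1, -1, 1, -1], 1), acf_looks_like_white_noise([1, -1] * 10, 1))
```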

Forecasting evaluation criteria
To evaluate the performance of the proposed hybrid model, several statistical criteria following [16,17,32] are used: MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), MSE (Mean Squared Error), and RMSE (Root Mean Squared Error). For the ARIMA model, criteria such as Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) have been widely used in time series analysis to determine the appropriate lag length [16,17]. The model with the smallest AIC and BIC values is therefore selected as the best ARIMA model. For the SVM model, three parameters, γ, C, and ε, are tuned to determine the best-fitting model; an inappropriate choice of these parameters can result in either overfitting or underfitting the training data. The SVM parameter set with the lowest MSE is selected for the best-fitting model. In the hybrid model, the ARIMA model first acts as a pre-processor that filters the linear pattern of the data sets; the error terms generated by the ARIMA model are then fed into the SVM, which is used to reduce the error of the ARIMA model.
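The paper names the four criteria but does not spell out their formulas here, so the Python functions below use the standard textbook definitions (an assumption). MAPE is returned as a fraction; multiply by 100 to express it as a percentage. MAPE is undefined when an actual value is zero, which can happen in daily death counts; how the study handled such days is not stated.

```python
import math


def mse(actual, pred):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)


def rmse(actual, pred):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, pred))


def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def mape(actual, pred):
    """Mean absolute percentage error as a fraction (multiply by 100 for percent)."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


# Hypothetical actual and predicted daily counts
actual, pred = [100, 200, 400], [110, 190, 400]
print(mse(actual, pred), rmse(actual, pred), mae(actual, pred), mape(actual, pred))
```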

Application of the hybrid model to daily cases of COVID-19 in Malaysia
This section analyses the performance of the proposed model in two respects: (1) its performance against the ARIMA and SVM models, and (2) its percentage improvement over the ARIMA and SVM models. Since WHO declared COVID-19 a pandemic, COVID-19 time series data sets have been widely studied. The predictive capability of the developed model was compared using three well-known data sets of daily COVID-19 cases in Malaysia (daily new positive cases, daily new fatality cases, and daily new recovered cases) to demonstrate the accuracy and effectiveness of the proposed model. All data cover 1 October 2020 to 4 November 2022 and were retrieved from the COVIDNOW website at https://covidnow.moh.gov.my/. In Table 1 … Table 2. The estimates of all parameters are shown in Table 3. The p-values of all parameters in this table are small; therefore, the models were statistically significant for the confirmed, recovered, and death cases and could be used to forecast future values [33,35].
Part II (Nonlinear modelling): To obtain an optimal machine learning model, the SVM parameters were selected based on support vector machine design concepts and pruning algorithms in the R programming software. For the daily new positive COVID-19 cases dataset, the parameters γ = 2, C = 256, and ε = 0.2 give the smallest MSE, 10321275 (see Table 4); these values were therefore selected for the best-fitting model for this dataset. For the daily new death and daily new recovered COVID-19 cases, the smallest MSE values are 1431.732 and 9885746, respectively (Table 4), and the parameters γ = 2, C = 256, and ε = 0.2 were again selected for the best-fitting models. This section also discusses the process of the proposed model for both parts, i.e., Part I (Linear modelling) and Part II (Nonlinear modelling), using the three well-known COVID-19 data sets (daily new positive cases, daily new death cases, and daily new recovered cases) to demonstrate the effectiveness of the proposed model. Both the linear and nonlinear modelling, as well as the data processing in this study, were carried out in the R language.
https://doi.org/10.1371/journal.pone.0285407.t001
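The selection rule described here, keeping the (γ, C, ε) combination with the smallest MSE, amounts to an exhaustive grid search. A minimal Python sketch follows. The `evaluate_mse` callable is hypothetical and stands in for fitting and scoring an SVR; in R the same idea is usually implemented with `e1071::tune()`, whose default is 10-fold cross-validation. How the study validated each setting is not stated; for time series, a chronological validation split avoids leaking future values into training. The grid values and the toy objective below are made up for illustration.

```python
from itertools import product


def select_svm_params(gammas, costs, epsilons, evaluate_mse):
    """Return (mse, (gamma, C, eps)) with the smallest MSE over the full grid.

    evaluate_mse is a hypothetical callable that fits an SVR with the given
    parameters and returns its validation MSE. Ties keep the first setting seen.
    """
    best = None
    for gamma, cost, eps in product(gammas, costs, epsilons):
        score = evaluate_mse(gamma, cost, eps)
        if best is None or score < best[0]:
            best = (score, (gamma, cost, eps))
    return best


# Toy objective (made up) whose minimum lies at gamma=1, C=16, eps=0.1
def toy(g, c, e):
    return abs(g - 1) + abs(c - 16) / 16 + abs(e - 0.1)


print(select_svm_params([0.5, 1, 2, 4], [1, 16, 256, 1024], [0.1, 0.2, 0.5], toy))
```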

New positive cases data forecasts
The daily new positive cases series was recorded from 1 October 2020 to 4 November 2022 (see Fig 2) and contains 765 data points. The number of daily new positive COVID-19 cases in Malaysia increased significantly from July 2021 before dropping below 5,000 new cases. However, cases increased again around March-April 2022 to a maximum of 33,406, and then decreased drastically until 4 November 2022. The daily new positive COVID-19 case data considered in this investigation, like other COVID-19 datasets, have been used extensively with a wide variety of linear and nonlinear time series models, including ARIMA, ANN, and machine learning methods [8-10,12,14,17,20-26,34]. The study of daily new positive COVID-19 cases is crucial as an indicator of the effectiveness of the preventive measures that have been, are being, and will be taken by the authorities to control the spread of this epidemic. To investigate the performance of the proposed model on this dataset, an approach similar to that of Aisyah et al. [16] is used, in which the dataset is divided into a training sample and a testing sample. According to Aisyah et al. [16] and Nurul Hila et al. [17], using 70-80% of the data for training and the remaining 20-30% for testing yields the best outcomes [36,37]. The training data are used to build the models, while the testing data are used to evaluate their forecasting performance with the statistical measures. Thus, in this study, the daily new positive COVID-19 case data set is divided into a training set and a test set. The training set consists of 612 observations from day 1 to day 612 (80% of the data, from 1 October 2020 to 4 June 2022) and is used exclusively to formulate the model.
The test set consists of 153 observations from days 613-765 (20%), covering 5 June 2022 to 4 November 2022, and is used to evaluate the forecasting performance of the proposed model.
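The sizes and dates of this chronological 80/20 split can be reproduced with a short Python check (an illustrative sketch, not the authors' code). It confirms that 1 October 2020 to 4 November 2022 inclusive spans 765 days, that 80% of that is 612 observations, and that day 612 is 4 June 2022. The split keeps the time order (the first 80% for training), which differs from shuffling observations at random.

```python
from datetime import date, timedelta


def chronological_split(start, end, train_frac=0.8):
    """Split the daily range [start, end] into training and test periods without shuffling.

    Returns (n_total, n_train, train_end, test_start).
    """
    n_total = (end - start).days + 1          # both end points are included
    n_train = round(n_total * train_frac)
    train_end = start + timedelta(days=n_train - 1)
    test_start = train_end + timedelta(days=1)
    return n_total, n_train, train_end, test_start


print(chronological_split(date(2020, 10, 1), date(2022, 11, 4)))
```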
Table 5 shows the performance of the proposed model on the daily new positive COVID-19 cases dataset. The results obtained from the proposed models in terms of … (Tables 4-6 and Figs 3-7) lead to the conclusion that the proposed model produced higher accuracy and efficiency than ARIMA and SVM.

New deaths cases data forecasts
Besides the Malaysian daily new positive COVID-19 cases dataset, the Malaysian daily new death cases dataset is also used to analyse the performance of the proposed models. Like the daily new positive cases data set, the daily new death cases data set covers 1 October 2020 to 4 November 2022 (see Fig 8), contains 765 data points, and is divided into two samples. Following the increase in reported daily positive COVID-19 cases, daily deaths also increased significantly, to around 600. In order to formulate the model, the training … The performance of the proposed models on the daily new death COVID-19 cases dataset is first characterized by statistical measures, namely MSE, MAPE, RMSE, and MAE, as shown in Table 7. For the training data, the proposed model gives the smallest MSE and MAE values, 49.4459 and 3.53812, respectively, compared with ARIMA and SVM. The same trend occurs on the test data, where all the statistical measures for the proposed model are smaller than those of the ARIMA and SVM models.
The study continues by investigating the estimated values of the proposed model for the daily new death COVID-19 case data set, as illustrated in Fig 8. This figure indicates that the proposed model's line is almost indistinguishable from the actual data. In addition, the estimated values of the ARIMA, SVM, and proposed models for the test sample are plotted in Figs 9-11, respectively. Again, the proposed model's line for the test sample (Fig 12) is relatively close to the actual data compared with the ARIMA and SVM models. This shows that the results of the proposed model are consistent with the previous findings: it is more efficient, accurate, and precise than the ARIMA and SVM models. In addition, as in Fig 12, the number … (Tables 7, 8 and Figs 9-11, 13) clearly show that the proposed model produced more efficient and accurate results than the ARIMA and SVM models.

New recovered cases data forecasts
The last dataset considered in this investigation is the dataset of daily new recovered COVID-19 cases in Malaysia. The number of patients who recovered from COVID-19 increased significantly twice. Starting in July 2021, the number of recovered patients increased exponentially, reaching over 22,500 in August 2021 before dropping (the time series plot is given in Fig 14). Around March-April 2022, the number of recovered cases increased again to a maximum of 33,872, then decreased, and remained relatively stable thereafter. Like the previous datasets, this dataset is divided into a training set and a test set. The training set, used to formulate the model, comprises 612 observations (80%) from 1 October 2020 to 4 June 2022, while the test sample, used to evaluate forecasting performance, comprises 153 observations (20%) from 5 June 2022 to 4 November 2022. Table 9 presents the performance of the proposed model on the daily new recovered COVID-19 cases dataset for the training and test samples. For the training sample, the proposed model produces the smallest MSE and MAE values, 99205.699 and 136.8519, respectively, compared with the ARIMA and SVM models. The test sample shows the same pattern: the proposed model produces the smallest MSE, MAPE, RMSE, and MAE, with values of 26108.02, 0.0396, 161.5797, and 104.1002, respectively, compared with ARIMA and SVM.
Meanwhile, the estimated values of the proposed model for the test sample of the daily new recovered COVID-19 cases dataset are depicted in Fig 15. Again, this figure shows that the predicted values from the proposed model are close to the actual values. A further … forthcoming three weeks and indicates that daily new recovered COVID-19 cases would increase in the upcoming days in Malaysia. The performance of the proposed model on the daily new recovered COVID-19 cases dataset was further investigated in terms of the percentage improvement in MSE, MAPE, RMSE, and MAE, as reported in Table 10. The proposed model shows improvements over ARIMA of 73.12%, 74.62%, 90.38%, and 68.99% in MAE, MAPE, MSE, and RMSE, respectively, and improvements over SVM of 71.99%, 73.67%, 89.11%, and 66.99% in the same measures. Therefore, the proposed model produced higher accuracy and efficiency than the ARIMA and SVM models.
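The formula behind the percentage improvements is not given in this section; the Python sketch below assumes the usual definition, (baseline error − proposed error) / baseline error × 100. Because MSE is the square of RMSE, the two improvements are linked: an RMSE reduction of r% implies an MSE reduction of (1 − (1 − r/100)²) × 100%. Applied to the reported 68.99% RMSE improvement over ARIMA, this gives about 90.38%, matching the MSE figure, which is consistent with the abstract's attribution of 90.38% to MSE and 68.99% to RMSE.

```python
def pct_improvement(baseline_error, proposed_error):
    """Percentage reduction in an error measure relative to a baseline model."""
    return (baseline_error - proposed_error) / baseline_error * 100


def mse_improvement_from_rmse(rmse_improvement_pct):
    """Since MSE = RMSE**2, an RMSE reduction of r% implies an MSE reduction of
    (1 - (1 - r/100)**2) * 100 percent."""
    remaining = 1 - rmse_improvement_pct / 100
    return (1 - remaining ** 2) * 100


print(round(mse_improvement_from_rmse(68.99), 2))
```

Small discrepancies (for example 89.10% computed from 66.99% versus the reported 89.11%) are expected because the reported RMSE improvements are themselves rounded.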

Conclusion
Accuracy and efficiency in predicting the spread of COVID-19 are crucial but often difficult to achieve for decision makers, especially frontline workers and the authorities. Although the spread of COVID-19 seems endless, research to improve the effectiveness of time series forecasting models has never stopped. Among these efforts is the hybrid approach, and one of the most popular categories of hybrid models decomposes a time series into linear and nonlinear components. In this study, a hybrid model that combines the predictions of a linear model and a nonlinear model is proposed. The proposed model was investigated using three well-known COVID-19 data sets, namely, daily new positive cases, daily new death

Limitations and future recommendation
This study attempted to forecast the total number of confirmed cases, fatalities, and recoveries of COVID-19 in Malaysia. The daily numbers of COVID-19 cases are affected by a very large number of factors, such as the population's adherence to prevention measures, vaccination, social isolation, and new variants of the virus. Therefore, to improve future predictions and forecasts, future studies of COVID-19 should take into consideration (i) clinical and behavioural aspects, and (ii) the possibility of under-reporting of cases or deaths, or delays in notification. In addition, to improve forecast accuracy in future work, SVM performance with different kernel functions and optimal SVM hyperparameters could be investigated. Multi-step forecasts could also be a focus of future work, since only one-step-ahead forecasting is considered in this paper; multi-step forecasts have been shown to make trading systems much more realistic [38]. Finally, another approach, such as bootstrapping, could be added to the hybridization of ARIMA and SVM [39]. The bootstrap is a reliable method, and few researchers have applied it to forecasting daily COVID-19 cases. Many studies have shown that the bootstrap resampling technique provides more accurate estimation [17,40-42].